A Functional Programming Approach to Distance-based Machine Learning
Abstract
Distance-based algorithms for both clustering and prediction are popular within the machine learning community. These algorithms typically deal with attribute-value (single-table) data, and the distance functions they use are typically hard-coded. We are concerned here with generic distance-based learning algorithms that work on arbitrary types of structured data. In our approach, distance functions are not hard-coded but are first-class citizens that can be stored, retrieved, and manipulated. In particular, we can assemble distance functions for complex structured data types on the fly from pre-existing components. To implement the proposed approach, we use the strongly typed functional language Haskell, which allows us to manipulate distance functions explicitly. We have produced a software library/application with structured data types and distance functions and used it to evaluate the potential of Haskell as a basis for future work in the field of distance-based machine learning.

1. General Framework for Data Mining

A general framework for data mining should elegantly handle different types of data, different data mining tasks, and different types of patterns/models. Dzeroski (2007) proposes such a framework, which explicitly considers different types of structured data and so-called generic learning algorithms that work on arbitrary types of structured data. The basic components of different types of such algorithms (such as distance-based or kernel-based ones) are discussed. Following the inductive database philosophy (Imielinski and Mannila 1996), in which patterns/models are first-class citizens that can be stored and manipulated, Dzeroski proposes to store and manipulate the basic components of data mining algorithms, such as distance functions.
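The idea of distance functions as first-class values that are assembled from components can be illustrated with a minimal Haskell sketch. This is not the authors' library: the names `Distance`, `numeric`, `discrete`, `pairDist`, `onField`, and `nearest` are hypothetical, and the Euclidean-style combination rule for pairs is only one assumed choice among many.

```haskell
module Main where

import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical type: a distance function is an ordinary value that can be
-- stored, passed as an argument, and combined with other distances.
type Distance a = a -> a -> Double

-- Primitive component: absolute difference on real numbers.
numeric :: Distance Double
numeric x y = abs (x - y)

-- Primitive component: 0/1 distance on any type with equality.
discrete :: Eq a => Distance a
discrete x y = if x == y then 0 else 1

-- Combinator (assumed rule): Euclidean-style combination of two component
-- distances into a distance on pairs.
pairDist :: Distance a -> Distance b -> Distance (a, b)
pairDist da db (a1, b1) (a2, b2) = sqrt (sq (da a1 a2) + sq (db b1 b2))
  where
    sq v = v * v

-- Combinator: reuse a distance on a field as a distance on a whole record.
onField :: (r -> a) -> Distance a -> Distance r
onField f d x y = d (f x) (f y)

-- Generic 1-nearest-neighbour prediction, parameterised by the distance.
-- Returns Nothing when there are no training examples.
nearest :: Distance a -> [(a, l)] -> a -> Maybe l
nearest _ [] _ = Nothing
nearest d examples q = Just (snd (minimumBy (comparing (d q . fst)) examples))

main :: IO ()
main = do
  -- Assemble a distance for (size, colour) pairs from existing components.
  let d = pairDist numeric discrete :: Distance (Double, String)
      training = [((1.0, "red"), "A"), ((5.0, "blue"), "B")]
  print (d (1.0, "red") (4.0, "red"))
  print (nearest d training (4.5, "red"))
```

Because `nearest` takes the distance as a parameter, the same learning algorithm works unchanged on any data type for which a `Distance` value can be built, which is the sense of "generic" used above.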
Similar resources
Two-stage fuzzy-stochastic programming for parallel machine scheduling problem with machine deterioration and operator learning effect
This paper deals with the determination of machine numbers and production schedules in manufacturing environments. In this line, a two-stage fuzzy stochastic programming model is discussed with fuzzy processing times where both deterioration and learning effects are evaluated simultaneously. The first stage focuses on the type and number of machines in order to minimize the total costs associat...
A Possibility Linear Programming Approach to Solve a Fuzzy Single Machine Scheduling Problem
This paper employs an interactive possibility linear programming approach to solve a single machine scheduling problem with imprecise processing times, due dates, as well as earliness and tardiness penalties of jobs. The proposed approach is based on a strategy of minimizing the most possible value of the imprecise total costs, maximizing the possibility of obtaining a lower total costs, and mi...
A new approach to fuzzy quantities ordering based on distance method and its applications for solving fuzzy linear programming
Many ranking methods have been proposed so far. However, there is yet no method that can always give a satisfactory solution to every situation; some are counterintuitive, not discriminating; some use only the local information of fuzzy values; some produce different ranking for the same situation. For overcoming the above problems, we propose a new method for ranking fuzzy quantities based on ...
Machine Reliability in a Dynamic Cellular Manufacturing System: A Comprehensive Approach to a Cell Layout Problem
The fundamental function of a cellular manufacturing system (CMS) is based on definition and recognition of a type of similarity among parts that should be produced in a planning period. Cell formation (CF) and cell layout design are two important steps in implementation of the CMS. This paper represents a new nonlinear mathematical programming model for dynamic cell formation that employs the ...
A New Hybrid Meta-Heuristics Approach to Solve the Parallel Machine Scheduling Problem Considering Human Resiliency Engineering
This paper proposes a mixed integer programming model to solve a non-identical parallel machine (NIPM) scheduling with sequence-dependent set-up times and human resiliency engineering. The presented mathematical model is formulated to consider human factors including Learning, Teamwork and Awareness. Moreover, processing time of jobs are assumed to be non-deterministic and dependent to their st...